Design of siRNA molecules for silencing of membrane glycoprotein, nucleocapsid phosphoprotein, and surface glycoprotein genes of SARS-CoV2

The global COVID-19 pandemic caused by SARS-CoV2 infected millions of people and resulted in more than 4 million deaths worldwide. Apart from vaccines and drugs, RNA silencing is a novel approach for treating COVID-19. In the present study, siRNAs were designed for the conserved regions targeting three structural genes, M, N, and S, from forty whole-genome sequences of SARS-CoV2 using four different software, RNAxs, siDirect, i-Score Designer, and OligoWalk. Only siRNAs which were predicted in common by all the four servers were considered for further shortlisting. A multistep filtering approach has been adopted in the present study for the final selection of siRNAs by the usage of different online tools, viz., siRNA scales, MaxExpect, DuplexFold, and SMEpred. All these web-based tools consider several important parameters for designing functional siRNAs, e.g., target-site accessibility, duplex stability, position-specific nucleotide preference, inhibitory score, thermodynamic parameters, GC content, and efficacy in cleaving the target. In addition, a few parameters like GC content and dG value of the entire siRNA were also considered for shortlisting of the siRNAs. Antisense strands were subjected to check for any off-target similarities using BLAST. Molecular docking was carried out to study the interactions of guide strands with AGO2 protein. A total of six functional siRNAs (two for each gene) have been finally selected for targeting M, N, and S genes of SARS-CoV2. The siRNAs have not shown any off-target effects, interacted with the domain(s) of AGO2 protein, and were efficacious in cleaving the target mRNA. However, the siRNAs designed in the present study need to be tested in vitro and in vivo in the future. Supplementary Information The online version contains supplementary material available at 10.1186/s43141-022-00346-z.

With the onset of the global COVID-19 pandemic, several research laboratories and biopharmaceutical companies have been actively engaged in the development of vaccines for COVID-19. A few vaccines have been approved and rolled out for public usage in this line. However, the emergence of new variants of SARS-CoV2 across the globe has resulted in the vaccines' reduced efficacy [4]. Besides this, shortfalls in the supply of vaccines against the demand are another constraint [5]. In these cases, there is a need for the development of alternate strategies for treating the patients infected with SARS-CoV2 to ensure that a greater number of successful candidate molecules are available for combating COVID-19. One such measure is the development of short interfering RNA (siRNA) as an antiviral agent.
A messenger RNA (mRNA) is cleaved by a singlestranded RNA known as short interfering RNA (siRNA). The process is known as RNA interference (RNAi). In this process, a double-stranded RNA is cleaved by a ribonuclease III-like enzyme known as Dicer, resulting in the formation of a duplex consisting of 21-23 nucleotides. One strand is known as a sense (passenger) strand, and another is known as an antisense (guide) strand. Before the inhibition of target mRNA, the duplex is loaded into an RNA-induced silencing complex (RISC). Subsequently, the passenger strand is lost, and the guide strand pairs with the mRNA by complementary base pairing and results in cleavage of the target mRNA [6,7].
In the present study, functional siRNAs were designed in silico for targeting the structural genes, viz. M, N, and S in the genome of SARS-CoV2. This paves the way forward for further studies in vitro and in vivo for the shortlisting of the effective siRNA candidates designed in the present study for usage by humans for combating COVID-19.

Retrieval of nucleotide sequences from NCBI -nucleotide database
In the present study, forty whole-genome sequences of SARS-CoV2 were retrieved from the nucleotide database of the National Centre for Biotechnology Information (NCBI) (https:// www. ncbi. nlm. nih. gov/ nucle otide/) ( Table 1). Nucleotide sequences corresponding to M, N, and S genes were used in the present study as targets for the design of potential siRNA.

Multiple sequence alignment
Multiple sequence alignment of the respective gene sequences was carried out using Clustal Omega (https:// www. ebi. ac. uk/ Tools/ msa/ clust alo/) for the identification of the conserved gene sequences.
Nucleotide sequences of the forty accessions corresponding to a particular gene were aligned using Clustal Omega. After alignment, stretches of nucleotide sequences without mismatches were identified and considered as conserved regions. The conserved regions were represented by an asterisk in the output obtained from the Clustal Omega. These regions were used for the design of siRNAs (Supplementary Tables 1, 2 and 3). Those conserved regions which were less than 73 nucleotides long were not considered for further analyses.

Design of siRNAs by siDirect
The algorithm of siDirect v2.0 selects functional siR-NAs which follow the first three out of four rules specified by Ui-Tei et al. [34]. Apart from this, there is an option to select siRNAs by choosing the two other algorithms, viz. Reynolds et al. [35] and Amarzguioui and Prydz [6]. In the present study, the combined rule "Ui-Tei (U) + Reynolds (R) + Amarzguioui (A)" was chosen for the selection of siRNAs. Furthermore, the seed-duplex stability (T m ) value was set to 21.5 °C.

Design of siRNAs by i-score designer
i-Score Designer was used in the present study for the prediction of active siRNAs. It works on an algorithm "i-Score" (inhibitory score) based on a linear regression model [9]. The conserved region sequences corresponding to the target genes of SARS-CoV2 were given as an input to i-Score Designer. In the i-Score designer, siRNAs were ranked according to i-Score which ranges from 100 to 0. Apart from calculating the i-Score and rank, the i-Score designer also calculates (i) ΔG value of the secondary structure of a siRNA strand, (ii) dinucleotide ΔG value at 5′ and 3′ ends, (iii) ΔG value for the entire siRNA (iv) number of GC stretches (v) % GC content (vi) scores of Ui-Tei, Amarzguioui, Hsieh, Takasaki, Reynolds, and Katoh, and (vii) scores and ranks of s-Biopredsi and DSIR.

Design of siRNAs by OligoWalk
OligoWalk online server [33] was utilized in the present study for predicting the efficient siRNAs at default parameters. The output of the program displays the predicted siRNAs along with the values for "probability of being an efficient siRNA" for each of the siRNAs. Besides probability values, the program also gives energy values (kcal/mol) for various parameters, viz. overall, duplex, break target, intraoligo, interoligo, and end_diff, besides T m -dup (in °C) and prefilter score. siRNAs predicted by OligoWalk have an efficiency of 78.6 % in silencing the target [33].

siRNA scales
siRNA scales web server available at http:// geste land. genet ics. utah. edu/ siRNA_ scales/ was utilized for the screening of the siRNAs predicted by the four tools, viz. RNAxs, siDirect, i-Score Designer, and OligoWalk. The antisense strand sequences were given as an input to the siRNA scales. Those siRNAs whose predicted value of efficacy ranged from 1 to ≤ 30 were considered for further analyses.

Calculation of the free energy of folding of the guide strand
MaxExpect web server (https:// rna. urmc. roche ster. edu/ RNAst ructu reWeb/ Serve rs/ MaxEx pect/ MaxEx pect. html) [36] was used for the prediction of the structure of the guide strand at default parameters (maximum % energy difference, 50; maximum number of structures, 1000; Windows size, 5; gamma, 1). The output of the program displays the structure of the RNA along with its score (also termed as energy). The cutoff value of the free energy of folding was set to 1.5 for shortlisting of the siRNAs in the present study [32].

Calculation of the free energy of binding between the guide strand and the target
DuplexFold web server (https:// rna. urmc. roche ster. edu/ RNAst ructu reWeb/ Serve rs/ Duple xFold/ Duple xFold. html) [37] was used for calculating the free energy of binding of the guide strand with the target at default parameters (maximum % energy difference, 5; maximum number of structures, 20; Window size, 0; maximum loop size, 30; temperature, 310.15 K). Guide-target duplexes with energy values ≤ −30 were shortlisted for further analyses [31,32].

Validation of the efficacy of siRNAs
Validation of the efficacy of the predicted siRNA molecules against their consequence targets was carried out using SMEpred (https:// bioin fo. imtech. res. in/ manojk/ smepr ed/) [38]. siRNAs (i.e., without any chemical modifications) with efficacy ≥ 85 were considered for shortlisting.

Off-target effects of siRNAs
BLAST ® (Basic Local Alignment Search Tool) (https:// blast. ncbi. nlm. nih. gov/ Blast. cgi) was utilized to find out the off-target matches of the siRNAs designed in the present study. Reverse complement of the guide strand sequences was subjected for similarity check. BLAST ® search was carried out with the following parameters: (i) choose search set -human genomic plus transcript (human G + T), models (XM/XP) excluded [28], and (ii) program selectionsomewhat similar sequences (blastn). The remaining parameters were set to their default. Search parameters were automatically adjusted to search for a short input sequence by the program.

Molecular docking of the guide strand and argonaute protein
Molecular docking was carried out between the guide strands of siRNAs and human argonaute-2 protein using the HDOCK server (http:// hdock. phys. hust. edu. cn/). In HDOCK, molecular docking is carried out between the ligand and receptor by using a fast Fourier transform (FFT)-based hierarchical approach [39]. The server builds the three-dimensional (3D) structure of the protein from a given amino acid sequence by searching for a homologous template in the Protein Data Bank (PDB).
In the present study, the amino acid sequence of the human Argonaute-2 protein (AGO2) (PDB ID 4F3T) was retrieved from PDB and given as an input for building the 3D model of the protein for docking. Nucleotide sequences of the guide strand of siRNA molecules were given as an input (as "ligand"). The server generates the three-dimensional model of RNA from the given input nucleotide sequence. The server returns with top 100 predictions along with docking scores, ligand root-mean square deviation (RMSD) values, ranks of the models, etc. The user can download the structures of the docked complexes besides the PDB files of receptor and ligand that were generated by homology modeling. The server also provides the homology quality scores of the receptor and ligand.
In the present study, the docked complexes that were ranked as one were considered for further analyses. The interacting residues of AGO protein with nucleotides of the guide strand of siRNA within 5 Å were reported.

siRNA design workflow strategy
The combinatorial strategy for the design of functional siRNAs in the present study is summarized below: (i) Identification of conserved regions across all the target genes by carrying out multiple sequence alignment (ii) Design of siRNAs for the conserved regions of the three target genes by RNAxs at default parameters (iii) Design of functional siRNAs by siDirect by considering the following parameters: i. Combined rule of the algorithm used for the functional siRNA selection: U + R + A ii. T m = 21.5 °C (maximum) (iv) Design of functional siRNAs by i-Score Designer and selection of siRNAs whose i-Score ≥ 65 (v) Design of functional siRNAs by OligoWalk online server (vi) Pooling of siRNAs that were predicted in common by all the four online servers (vii) Filtering of siRNAs obtained from step (vi) using siRNA scales at a cutoff value of ≤ 30 (viii) Shortlisting of siRNAs obtained from the above step whose dG ≥ −34.6 kcal/mol and GC content in the range from ≥ 31.6 to 53 % (ix) Secondary structure prediction of siRNAs obtained from step (viii) by MaxExpect web server and further shortlisting of siRNAs whose free energy of folding ≥ 1.5 (x) Calculation of the free energy of binding between the antisense strand and the target region by DuplexFold server and further shortlisting of siR-NAs whose energy is ≤ −30 (xi) Prediction of the efficacy of siRNAs in inhibiting the target mRNA obtained from the above step by SMEpred. A cutoff value of ≥ 85 was set for further shortlisting (xii) Study of the off-target matches of siRNAs obtained from the preceding step using BLAST ® (xiii) Molecular docking of the shortlisted guide strands with AGO protein

Results and discussion
In the present study, three structural genes M, N, and S that code for a membrane glycoprotein, nucleocapsid phosphoprotein, and surface glycoprotein, respectively, were used for the design of function siRNAs. This is the first study to screen most of the structural genes in the genome of SARS-CoV2 for the design of functional siR-NAs. Apart from vaccines and drug candidates, target gene silencing by siRNA is an important approach for combating COVID-19. In the present study, four different online tools, viz. RNAxs, siDirect v2.0, i-Score Designer, and OligoWalk, were used for the design of functional siRNAs. Accessibility of siRNA to the target site in mRNA is an essential prerequisite for RNAi. RNAxs, a web server, was designed by considering the target site accessibility as a criterion along with other known siRNA design rules for the prediction of highly functional siRNAs [10]. The number of siRNAs predicted by RNAxs and the range of worst rank (WR) for the three genes were as follows: The number of functional siRNAs predicted by siDirect (those satisfied the laid criteria) for the three genes was as follows: (i) M -2, (ii) N -28, and (iii) S -240 (Supplementary Tables 7, 8 and 9). Only those siRNAs which have satisfied the rules that affect the siRNA activity as prescribed by the three functional siRNA selection algorithms [6,34,35] in any one of the following ways, viz. U, R, A, URA, UR, RA, and UA, were selected.
For the activity of siRNA, a near-perfect [8,40,41] complementary base pairing is required between the guide strand of the siRNA and the target mRNA. However, if the siRNA has complementarity with the nontarget region, it may result in silencing of the nontarget genes; especially, off-target silencing happens if the unintended regions base pair with the seed region of siRNA due to complementarity [8]. To overcome this effect, siDirect selects the guide and passenger strands of siR-NAs whose T m value is below 21.5 °C for reducing the seed-dependent off-target effects [8,42]. This is another unique feature of siDirect. In the present study, siRNAs whose T m value was below 21.5 °C, satisfying the rules prescribed by Ui-Tei et al. [34], Reynolds et al. [35], and Amarzguioui and Prydz [6], was selected.
The next online tool used in the present study for the design of siRNAs was "i-Score Designer, " a second-generation algorithm that adopts a linear regression model for computing the nucleotide preferences at each position on the antisense strand. The tool calculates i-Score for each of the siRNA which is completely based on the nucleotide preferences at each site on a scale of 0-100. The number of functional siRNAs predicted by i-Score Designer for the conserved regions of the three genes was as follows: (i) M -193, (ii) N -712, and (iii) S -2296 (Supplementary Tables 10, 11 and 12). siRNAs whose i-Score ≥ 65 alone were considered for further analyses.
OligoWalk web server was utilized in the present study for the prediction of siRNAs that have an efficiency of more than 70% in silencing the target mRNA. The program was designed by taking into consideration the thermodynamic aspects (the free energy changes of different equilibrium states, viz. unimolecular (ΔG o intra-siRNA ) and bimolecular (ΔG o inter-siRNA ) siRNA folding, unimolecular folding state of mRNA at the siRNA binding region (ΔG o target structure ) in addition to the hybridized state of siRNA, and target mRNA (ΔG o duplex )) and sequence features of siRNA for predicting the efficacy of the siRNA molecule by the help of support vector machine (SVM) [7,33]. The number of siRNAs predicted by the OligoWalk web server for the three genes was as follows: siRNAs that were predicted in common by all the four siRNA prediction tools were further screened using the siRNA scales web server for their efficiency to cleave the target mRNA at a cutoff value of ≤ 30 (except for the "M" gene, where none of the siRNAs designed by siDirect had in common with those predicted by RNAxs, i-Score Designer, and OligoWalk. Hence siRNAs predicted in common by RNAxs, i-Score Designer, and OligoWalk were considered for further analyses). A few antisense strands were eliminated in the present study as their predicted value of efficiency was > 30. siRNA scales were built on a linear regression model by considering the following three parameters: (i) thermodynamic stability (ΔG values) of the first two base pairs and the last two base pairs in the antisense strand of siRNA, (ii) nucleotide preferences at specific positions, and (iii) G + C content [11].
The total number of siRNAs predicted in common by all the four siRNA design tools and further shortlisted by siRNA scales was as follows: (i) M -14, (ii) N -6, and (iii) S -66 (Supplementary Table 16) -(step 1). One of the important parameters about the functionality of a siRNA is the Gibbs free energy (dG) [43]. i-Score Designer calculates the whole dG values of all the predicted siRNAs. Ichihara et al. [9] observed a high correlation coefficient between observed and predicted siRNAs with i-Score when dG values were elevated from −52.0 to −34.6 kcal/mol. A cutoff dG value of ≥ −34.6 kcal/ mol was imposed in the present study for the shortlisting of siRNAs [44]. In addition to this, in the same study, it was found that a median number of 65 active siRNAs per mRNA from the human RefSeq database was predicted when i-Score and dG values were > 65 and ≥ −34.6, respectively [9]. As a result, siRNAs designed by the i-Score designer, whose i-Score and dG values ≥ 65 and ≥ −34.6, respectively, were shortlisted for further analyses.
Another important feature concerning the functionality of siRNA is the percentage content of GC. Higher GC content hinders the unwinding of the duplex siRNA. Lower GC content may lead to weak interactions between the antisense (guide) strand of siRNA and target mRNA [6]. Amarzguioui and Prydz [6] suggested the GC range of 31.6-57.9% to be optimal for functionality. However, optimal % GC content was varying from one study to another. For example, Chalk et al. [45] suggested % GC content from 36 to 53 to be effective, whereas Elbashir et al. [46] suggested % GC content from 32 to 79 to be effective [13]. In the present study, the optimal GC content was set in the range of 31.6-53%. In particular, those siRNAs whose % GC < 31.6 were eliminated.    The number of siRNAs obtained from step 1 whose dG ≥ −34.6 kcal/mol and GC content in the range from 31.6 to 53% was 30 (M gene -7, N gene -4, and S gene -19) (Supplementary Table 17) (step 2).
MaxExpect web server was utilized in the present study for the prediction of the secondary structure of the antisense strands of the shortlisted siRNAs. The server generates the structure(s), in which the base pairs have the highest possibility of being accurate [36]. siRNAs whose free energy of folding ≥ 1.5 were shortlisted for further analyses [32]. The number of siRNAs obtained from step 2 that has passed the set criterion (energy ≥ 1.5) for M, N, and S genes was 7, 4, and 19, respectively (Supplementary Table 18), (Supplementary Figs. S1 a-c), (Fig. 1 a-f ), (step 3).
DuplexFold web server was used in the present study for calculating the free energy of folding between the guide strand and the target region. The server generates the structure between two RNA strands that has the lowest free energy state without allowing the intramolecular base pairs formation [37]. Those siRNAs whose free energy of folding values ≤ −30 were considered for further analyses [30,32]. The number of siRNAs obtained from step 3, which have passed the set criterion (energy ≤ −30) for M, N, and S genes was 5, 4, and 16, respectively (Supplementary Table 19), ( Supplementary  Fig. S2 a-c and Fig. 2 a-f -Step 4).
SMEpred is a SVM-based method for predicting the efficacy of normal siRNAs as well as chemically modified siRNAs that are designed from a given mRNA or a gene sequence [38]. SMEpred web server was utilized in the present study to further screen the siRNAs obtained from the preceding steps for their efficacy. siRNAs with efficacy scores ≥ 85 were shortlisted for further analyses. The number of siRNAs at this stage for M, N, and S genes was 5, 3, and 13, respectively (step 5).
BLAST ® was used to assess the extent of off-target matches of siRNAs designed in the present study. None of the 21 nucleotides of the guide strand of siRNAs targeting M, N, and S regions matched/aligned with the subject sequences in the human G + T database. The minimum to maximum identity ranged from 12/12 to 19/21, respectively. The identity of 19/21 had a very high expect value of 184, which is not significant at the default expect (E) threshold value of 0.05. Only S31.1 siRNA had the identity    [28]. The number of siRNAs at this stage for M, N, and S genes was 5, 3, and 13, respectively (Supplementary Tables 20, 21 and 22) (step 6). The top 2 siRNAs obtained from step 6 for each of the three regions in terms of the SMEpred score were finalized and were used for molecular docking studies (Tables 2, 3 and 4).
The sense strand sequences of the six shortlisted siRNAs (Tables 2, 3 and 4) were screened for on/off-target similarities against the forty whole-genome sequences of SARS-CoV2. It was found that all the six shortlisted siRNAs showed an on-target effect (i.e., matched with the intended regions), and none of them showed off-target matches.
The human AGO proteins consist of four functional domains, namely N-terminal domain (N), PIWI/ Argonaute/Zwille (PAZ) domain, MID domain, and P-element-induced wimpy tested (PIWI) domain. The cap-binding-like domain (MC) is found within the MID domain [47,48]. The PIWI domain in AGO2 and 3 contains four amino acids: aspartic acid (D), glutamic acid (E), aspartic acid (D), and histidine (H) known as "catalytic tetrad, " which is essential for the cleavage of the mRNA. The PAZ and MID domains bind with the 3′ and 5′ end of the guide strand, respectively [47]. In the present study, the docked complex with the lowest binding energy, i.e., model 1 among the output complexes, was considered to be the best one. Besides considering the binding score, the domains with which RNA interacted were also studied. This was known by looking at the interacting residues of the AGO protein with the guide strand of siRNA (Supplementary Table 23). The amino acid sequence positions of the PAZ and PIWI domains were known from the UniProt database by giving the PDB ID of the protein that was used as a template by HDOCK.
[30] and Shawan et al. [31]. siRNA N10.1 has spread across the domains in the docked complex when compared with the remaining siRNAs (Fig. 8). All the siRNAs in the docked complexes were found to interact with the PIWI domain. Concerning the PAZ domain, only N10.1 and S28.5 siRNAs were found to show interaction. Among the siRNAs designed for S, M, and N genes, S28.5, M8.3, and N11.2 have the lowest (more negative value) docking scores of −356.13, −350.68, and −347.70, respectively (Supplementary Table 23). Although only N10.1 and S28.5 siRNAs have interacted with both PAZ and PIWI domains, the remaining siRNAs are efficacious in terms of the SMEpred score. More or less a similar situation was also observed in the study of Chowdhury et al. [30], where siRNAs from cluster 2 had fewer interactions with AGO2 protein in the docked complex compared to cluster 1. However, siRNAs from cluster 2 outperformed in terms of siRNAPred scores by showing better efficacy than the g15 siRNA from cluster 1, which is the best candidate (g15) in terms of molecular interactions with the AGO2 protein.
Apart from designing functional siRNAs in silico, efficient delivery of siRNAs is essential for achieving the target gene silencing. Different methods for in vivo delivery of siRNAs are available from Chen et al. [49], Van den Berg et al. [50], and Lundstrom [51]. Also, strategies to reduce off-target gene silencing are outlined in Liu et al. [13], Dar et al. [38], and Chen et al. [49].
In the present study, six siRNAs were shortlisted targeting M, N, and S genes of SARS-CoV2 that were found to be effective based on the outcomes of various tools that were used. However, in the future, siRNAs designed in the present study need to be further tested in vitro and in vivo to confirm their efficacy for the treatment of COVID-19.

Conclusion
siRNA-mediated gene silencing is one of the promising approaches for treating COVID-19. In this study, a total of six different functional siRNAs targeting three structural genes M, N, and S (two siRNAs for each of the three genes) were designed and tested for their efficacy using different online tools, along with molecular docking studies. The efficiency in the design of siRNAs against SARS-CoV2 was maximized in the present study by using diverse online tools and subsequent shortlisting of the designed siRNAs based on a few important parameters. All the six siRNAs designed in the present study were found to be effective in inhibiting the target. Experimental validations of siRNAs designed in the present study need to be carried out in the future.